Kernel Density Estimation for Text-Based Geolocation

Authors

  • Mans Hulden
  • Miikka Silfverberg
  • Jerid Francom

Abstract

Text-based geolocation classifiers often operate with a grid-based view of the world. Predicting a document's location of origin from its text content on a geodesic grid is computationally attractive, since many standard methods for supervised document classification carry over unchanged to geolocation in the form of predicting the most probable grid cell for a document. However, the grid-based approach suffers from sparse-data problems if one wants to improve classification accuracy by moving to smaller cell sizes. In this paper we investigate an enhancement, based on kernel density estimation, of common methods for determining the geographic point of origin of a text document. For geolocation of tweets we obtain improvements over non-kernel methods on datasets of U.S. and global Twitter content.
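
The contrast the abstract draws, between assigning each training document to a single grid cell and smoothing documents over nearby cells with a kernel, can be illustrated with a small sketch. This is not the authors' implementation; it assumes a multinomial Naive Bayes classifier over a regular latitude/longitude grid, a Gaussian kernel on great-circle distance, and a hand-picked bandwidth (`bandwidth_km`). The function names `make_grid`, `train`, and `predict`, the smoothing constant `alpha`, and the toy documents are all hypothetical.

```python
# Hypothetical sketch: grid-cell Naive Bayes geolocation, with and without
# Gaussian kernel smoothing of training documents over grid cells.
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def make_grid(lat_min, lat_max, lon_min, lon_max, cell_deg):
    """Centres of a regular lat/lon grid with cells of cell_deg x cell_deg degrees."""
    n_lat = round((lat_max - lat_min) / cell_deg)
    n_lon = round((lon_max - lon_min) / cell_deg)
    return [(lat_min + (i + 0.5) * cell_deg, lon_min + (j + 0.5) * cell_deg)
            for i in range(n_lat) for j in range(n_lon)]


def train(docs, centers, bandwidth_km=None):
    """Accumulate per-cell word weights and per-cell document mass.

    docs: list of (lat, lon, tokens).
    bandwidth_km=None: plain grid baseline; each document counts fully toward
    its nearest cell centre and nowhere else.
    bandwidth_km=h: each document is spread over every cell with weight
    exp(-d^2 / (2 h^2)), where d is the great-circle distance to the cell centre.
    """
    word_w = [Counter() for _ in centers]
    mass = [0.0] * len(centers)
    for lat, lon, tokens in docs:
        dists = [haversine_km(lat, lon, clat, clon) for clat, clon in centers]
        if bandwidth_km is None:
            nearest = min(range(len(centers)), key=dists.__getitem__)
            weights = [1.0 if j == nearest else 0.0 for j in range(len(centers))]
        else:
            weights = [math.exp(-0.5 * (d / bandwidth_km) ** 2) for d in dists]
        tf = Counter(tokens)
        for j, k in enumerate(weights):
            if k == 0.0:
                continue
            mass[j] += k
            for w, c in tf.items():
                word_w[j][w] += k * c
    return word_w, mass


def predict(tokens, centers, word_w, mass, vocab, alpha=0.01):
    """Return the centre of the cell maximising log prior + multinomial log-likelihood."""
    best, best_score = None, -math.inf
    for j, counts in enumerate(word_w):
        if mass[j] == 0.0:
            continue  # a cell with no (kernel-weighted) training data cannot be predicted
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(mass[j])  # prior proportional to the cell's document mass
        for w in tokens:
            if w in vocab:  # out-of-vocabulary tokens are skipped
                score += math.log((counts[w] + alpha) / denom)
        if score > best_score:
            best, best_score = centers[j], score
    return best


if __name__ == "__main__":
    centers = make_grid(40, 42, -90, -88, 1.0)  # a toy 2 x 2 grid
    docs = [(40.5, -89.5, ["corn", "field"]), (41.5, -88.5, ["lake", "wind"])]
    vocab = {w for _, _, toks in docs for w in toks}
    grid_w, grid_mass = train(docs, centers)
    kde_w, kde_mass = train(docs, centers, bandwidth_km=100.0)
    print(grid_mass)  # [1.0, 0.0, 0.0, 1.0]
    print(predict(["corn"], centers, kde_w, kde_mass, vocab))  # (40.5, -89.5)
```

In the baseline, cells that contain no training document get zero mass and can never be predicted; with the kernel, every cell receives some weight from nearby documents. This is one way to counter the sparse-data problem that arises as cells get smaller. The 100 km bandwidth here is an arbitrary choice and would typically need tuning on held-out data.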

Similar resources

Comparison of the Gamma kernel and the orthogonal series methods of density estimation

The standard kernel density estimator suffers from a boundary bias issue for probability density function of distributions on the positive real line. The Gamma kernel estimators and orthogonal series estimators are two alternatives which are free of boundary bias. In this paper, a simulation study is conducted to compare small-sample performance of the Gamma kernel estimators and the orthog...

Mapping Web Pages by Internet Protocol (IP) addresses: Analyzing Spatial and Temporal Characteristics of Web Search Engine Results

Internet Protocol (IP) addresses are frequently used as a method of locating web users by researchers in several different fields. However, there are competing reports concerning the accuracy of those locations, and little research has been done in manually comparing the IP geolocation databases and web page geographic information. This paper categorized web pages from the Yahoo search engine in...

Identification of Hazardous Situations using Kernel Density Estimation Method Based on Time to Collision, Case study: Left-turn on Unsignalized Intersection

The first step in improving traffic safety is identifying hazardous situations. Based on traffic accidents’ data, identifying hazardous situations in roads and the network is possible. However, in small areas such as intersections, especially in maneuvers resolution, identifying hazardous situations is impossible using accident’s data. In this paper, time-to-collision (TTC) as a traffic conflic...

Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks

We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncer...

Anomaly Detection and Modeling of Trajectories

The recent boom in the availability and use of geolocation technologies has created a great need to understand datasets of trajectories. Moreover, trajectory data is collected in a wide range of different domains including: meteorology, zoology, and business. However, trajectories have several intrinsic attributes that make them difficult to analyze. First, their time-series nature makes applyi...

Publication date: 2015